19 research outputs found

    Language (Technology) is Power: A Critical Survey of "Bias" in NLP

    Full text link
    We survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are often vague, inconsistent, and lacking in normative reasoning, despite the fact that analyzing "bias" is an inherently normative process. We further find that these papers' proposed quantitative techniques for measuring or mitigating "bias" are poorly matched to their motivations and do not engage with the relevant literature outside of NLP. Based on these findings, we describe the beginnings of a path forward by proposing three recommendations that should guide work analyzing "bias" in NLP systems. These recommendations rest on a greater recognition of the relationships between language and social hierarchies, encouraging researchers and practitioners to articulate their conceptualizations of "bias"---i.e., what kinds of system behaviors are harmful, in what ways, to whom, and why, as well as the normative reasoning underlying these statements---and to center work around the lived experiences of members of communities affected by NLP systems, while interrogating and reimagining the power relations between technologists and such communities.

    "One-size-fits-all"? Observations and Expectations of NLG Systems Across Identity-Related Language Features

    Full text link
    Fairness-related assumptions about what constitutes appropriate NLG system behaviors range from invariance, where systems are expected to respond identically to social groups, to adaptation, where responses should instead vary across them. We design and conduct five case studies, in which we perturb different types of identity-related language features (names, roles, locations, dialect, and style) in NLG system inputs to illuminate tensions around invariance and adaptation. We outline people's expectations of system behaviors, and surface potential caveats of these two contrasting yet commonly-held assumptions. We find that motivations for adaptation include social norms, cultural differences, feature-specific information, and accommodation; motivations for invariance include perspectives that favor prescriptivism, view adaptation as unnecessary or too difficult for NLG systems to do appropriately, and are wary of false assumptions. Our findings highlight open challenges around defining what constitutes fair NLG system behavior. Comment: 36 pages, 24 figures.
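
    The perturbation setup this abstract describes lends itself to a short illustration. The Python sketch below builds minimal-pair prompts that differ only in one identity-related feature (here, a first name) and collects an NLG system's responses for side-by-side comparison. It is a minimal sketch only: the perturb_prompt and collect_responses helpers, the generate placeholder, the prompt template, and the example names are hypothetical stand-ins, not the paper's own code or study materials.

    from typing import Callable, Dict, List

    def perturb_prompt(template: str, names: List[str]) -> Dict[str, str]:
        """Fill the same prompt template with each name to create minimal pairs."""
        return {name: template.format(name=name) for name in names}

    def collect_responses(prompts: Dict[str, str],
                          generate: Callable[[str], str]) -> Dict[str, str]:
        """Query the (assumed) NLG system once per perturbed prompt."""
        return {name: generate(prompt) for name, prompt in prompts.items()}

    if __name__ == "__main__":
        template = "Write a short email inviting {name} to a team lunch."
        names = ["Aisha", "José", "Mei", "John"]  # illustrative identity-related feature

        def generate(prompt: str) -> str:
            # Placeholder; a real study would call an actual NLG system here.
            return "[model output for: " + prompt + "]"

        prompts = perturb_prompt(template, names)
        for name, response in collect_responses(prompts, generate).items():
            print("--- " + name + " ---")
            print(response)

    Comparing the collected responses across such minimal pairs is what surfaces whether a system behaves invariantly or adapts to the perturbed feature; judging which behavior is appropriate is the normative question the paper examines.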

    Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications

    Full text link
    There are many ways to express similar things in text, which makes evaluating natural language generation (NLG) systems difficult. Compounding this difficulty is the need to assess varying quality criteria depending on the deployment setting. While the landscape of NLG evaluation has been well-mapped, practitioners' goals, assumptions, and constraints -- which inform decisions about what, when, and how to evaluate -- are often partially or implicitly stated, or not stated at all. Combining a formative semi-structured interview study of NLG practitioners (N=18) with a survey study of a broader sample of practitioners (N=61), we surface goals, community practices, assumptions, and constraints that shape NLG evaluations, examining their implications and how they embody ethical considerations. Comment: Camera Ready for NAACL 2022 (Main Conference).

    How to write a bias statement: Recommendations for submissions to the Workshop on Gender Bias in NLP

    No full text
    At the Workshop on Gender Bias in NLP (GeBNLP), we'd like to encourage authors to give explicit consideration to the wider aspects of bias and its social implications. For the 2020 edition of the workshop, we therefore requested that all authors include an explicit bias statement in their work to clarify how their work relates to the social context in which NLP systems are used. The programme committee of the workshop included a number of reviewers with a background in the humanities and social sciences, in addition to NLP experts doing the bulk of the reviewing. Each paper was assigned one of those reviewers, and they were asked to pay specific attention to the provided bias statements in their reviews. This initiative was well received by the authors who submitted papers to the workshop, several of whom said they received useful suggestions and literature hints from the bias reviewers. We are therefore planning to keep this feature of the review process in future editions of the workshop. Comment: This document was originally published as a blog post on the web site of GeBNLP 2020.

    Taxonomizing and Measuring Representational Harms: A Look at Image Tagging

    No full text
    In this paper, we examine computational approaches for measuring the "fairness" of image tagging systems, finding that they cluster into five distinct categories, each with its own analytic foundation. We also identify a range of normative concerns that are often collapsed under the terms "unfairness," "bias," or even "discrimination" when discussing problematic cases of image tagging. Specifically, we identify four types of representational harms that can be caused by image tagging systems, providing concrete examples of each. We then consider how different computational measurement approaches map to each of these types, demonstrating that there is not a one-to-one mapping. Our findings emphasize that no single measurement approach will be definitive and that it is not possible to infer from the use of a particular measurement approach which type of harm was intended to be measured. Lastly, equipped with this more granular understanding of the types of representational harms that can be caused by image tagging systems, we show that attempts to mitigate some of these types of harms may be in tension with one another.
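
    As one hedged illustration of the kind of computational measurement approach this abstract refers to, the Python sketch below compares how often a tagger assigns a given tag to images associated with different groups and reports the largest gap. The tag_rates and max_rate_gap helpers, the group labels, and the example tags are assumptions made for illustration; they stand in for only one family of measurements and do not reproduce the paper's five categories of approaches or its taxonomy of representational harms.

    from collections import defaultdict
    from typing import Dict, Iterable, List, Tuple

    def tag_rates(predictions: Iterable[Tuple[str, List[str]]], tag: str) -> Dict[str, float]:
        """Per group, the fraction of images the tagger labeled with `tag`."""
        totals: Dict[str, int] = defaultdict(int)
        hits: Dict[str, int] = defaultdict(int)
        for group, tags in predictions:
            totals[group] += 1
            hits[group] += int(tag in tags)
        return {group: hits[group] / totals[group] for group in totals}

    def max_rate_gap(rates: Dict[str, float]) -> float:
        """Largest difference in tag rates across groups (0.0 if no groups)."""
        values = list(rates.values())
        return max(values) - min(values) if values else 0.0

    if __name__ == "__main__":
        # Illustrative (group, assigned_tags) pairs; real data would come from an actual tagger.
        preds = [
            ("group_a", ["person", "doctor"]),
            ("group_a", ["person"]),
            ("group_b", ["person", "doctor"]),
            ("group_b", ["person", "doctor"]),
        ]
        rates = tag_rates(preds, tag="doctor")
        print(rates, "max gap:", max_rate_gap(rates))

    A gap computed this way captures only one narrow notion of "fairness"; as the paper argues, the same number cannot tell us which representational harm, if any, was at stake.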